{
  "cells": [
    {
      "cell_type": "markdown",
      "id": "fcd145fa-10d4-4597-9250-1c61984fc5bb",
      "metadata": {
        "id": "fcd145fa-10d4-4597-9250-1c61984fc5bb"
      },
      "source": [
        "# Copyright 2024 Google LLC\n",
        "#\n",
        "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "#     https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ]
    },
    {
      "cell_type": "markdown",
      "id": "201cd5fa-25e0-4bd7-8a27-af1fc85a12e7",
      "metadata": {
        "id": "201cd5fa-25e0-4bd7-8a27-af1fc85a12e7"
      },
      "source": [
        "This section shows you how to upload vectors into a new Weaviate collection and run simple search queries using the official Weaviate Python client. The example uses a CSV dataset that lists books in different genres, and Weaviate serves as the search engine."
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Install **kubectl** and the **Google Cloud SDK** with the necessary authentication plugin for Google Kubernetes Engine (GKE)."
      ],
      "metadata": {
        "id": "m6xkFsP9lANt"
      },
      "id": "m6xkFsP9lANt"
    },
    {
      "cell_type": "code",
      "source": [
        "%%bash\n",
        "\n",
        "curl -LO \"https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl\"\n",
        "sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl\n",
        "sudo apt-get update && sudo apt-get install -y apt-transport-https ca-certificates gnupg\n",
        "curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo gpg --dearmor -o /usr/share/keyrings/cloud.google.gpg\n",
        "echo \"deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.cloud.google.com/apt cloud-sdk main\" | sudo tee -a /etc/apt/sources.list.d/google-cloud-sdk.list\n",
        "sudo apt-get update && sudo apt-get install -y google-cloud-cli-gke-gcloud-auth-plugin"
      ],
      "metadata": {
        "id": "N1HivL_jlEL-"
      },
      "id": "N1HivL_jlEL-",
      "execution_count": null,
      "outputs": []
    },
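    {
      "cell_type": "markdown",
      "source": [
        "Replace `<CLUSTER_NAME>` with your cluster name, for example `weaviate-cluster`, then retrieve the GKE cluster's credentials with `gcloud`. The command reads the region from the `GOOGLE_CLOUD_REGION` environment variable, so make sure it is set."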
      ],
      "metadata": {
        "id": "v7x8MfDRl9Z9"
      },
      "id": "v7x8MfDRl9Z9"
    },
    {
      "cell_type": "code",
      "source": [
        "%%bash\n",
        "\n",
        "export KUBERNETES_CLUSTER_NAME=<CLUSTER_NAME>\n",
        "gcloud container clusters get-credentials $KUBERNETES_CLUSTER_NAME --region $GOOGLE_CLOUD_REGION"
      ],
      "metadata": {
        "id": "W0tJ9DpImOP_"
      },
      "id": "W0tJ9DpImOP_",
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "Download the dataset from GitHub."
      ],
      "metadata": {
        "id": "I77EqyFcmcn5"
      },
      "id": "I77EqyFcmcn5"
    },
    {
      "cell_type": "code",
      "source": [
        "%%bash\n",
        "\n",
        "export DATASET_PATH=https://raw.githubusercontent.com/epam/kubernetes-engine-samples/Weaviate/databases/weaviate/manifests/02-notebook/dataset.csv\n",
        "curl -s -LO $DATASET_PATH"
      ],
      "metadata": {
        "id": "F8Zy_NIzmeJR"
      },
      "id": "F8Zy_NIzmeJR",
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "Create a `.env` file with the environment variables required to connect to Weaviate in the Kubernetes cluster."
      ],
      "metadata": {
        "id": "sNn04YZ-m9pZ"
      },
      "id": "sNn04YZ-m9pZ"
    },
    {
      "cell_type": "code",
      "source": [
        "%%bash\n",
        "\n",
        "echo WEAVIATE_ENDPOINT=$(kubectl get svc weaviate-ilb -n weaviate --output jsonpath=\"{.status.loadBalancer.ingress[0].ip}\") > .env\n",
        "echo APIKEY=$(kubectl get secret apikeys -n weaviate --template={{.data.AUTHENTICATION_APIKEY_ALLOWED_KEYS}} | base64 -d) >> .env\n",
        "echo PALM_APIKEY=$(gcloud auth print-access-token) >> .env"
      ],
      "metadata": {
        "id": "uZnC1EZPm-7g"
      },
      "id": "uZnC1EZPm-7g",
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "id": "51247bbb-a52f-4003-9596-439f60f3b3c9",
      "metadata": {
        "id": "51247bbb-a52f-4003-9596-439f60f3b3c9"
      },
      "source": [
        "Install the Weaviate Python client and `python-dotenv`:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "c77e963b-c9ea-47a9-bdbb-029372838364",
      "metadata": {
        "id": "c77e963b-c9ea-47a9-bdbb-029372838364"
      },
      "outputs": [],
      "source": [
        "! pip install weaviate-client python-dotenv"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "320f0cb6-61c9-42fe-b361-ea3c92c35421",
      "metadata": {
        "id": "320f0cb6-61c9-42fe-b361-ea3c92c35421"
      },
      "source": [
        "Import Python libraries:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "bb5ca67b-607d-4b23-926a-6459ea584f45",
      "metadata": {
        "id": "bb5ca67b-607d-4b23-926a-6459ea584f45"
      },
      "outputs": [],
      "source": [
        "import csv\n",
        "import os\n",
        "\n",
        "import weaviate\n",
        "from dotenv import load_dotenv\n",
        "from weaviate.classes.config import Configure\n",
        "from weaviate.connect import ConnectionParams"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "15f64563-f932-4a38-bd96-5b9d5cfadfd3",
      "metadata": {
        "id": "15f64563-f932-4a38-bd96-5b9d5cfadfd3"
      },
      "source": [
        "Load the book records from the CSV file so that they can be inserted into a Weaviate collection:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "013284ff-e4b6-4ad7-b330-17860121c4c1",
      "metadata": {
        "id": "013284ff-e4b6-4ad7-b330-17860121c4c1"
      },
      "outputs": [],
      "source": [
        "# newline='' lets the csv module handle line endings inside quoted fields\n",
        "with open('/content/dataset.csv', newline='') as csv_file:\n",
        "    books = list(csv.DictReader(csv_file))"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "df7eb305-6f3e-4215-8090-71d044a302aa",
      "metadata": {
        "id": "df7eb305-6f3e-4215-8090-71d044a302aa"
      },
      "source": [
        "Define the Weaviate connection. It requires an API key for authentication:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "13427782-ab29-4acc-9b24-59cf1f287619",
      "metadata": {
        "id": "13427782-ab29-4acc-9b24-59cf1f287619"
      },
      "outputs": [],
      "source": [
        "load_dotenv()\n",
        "auth_config = weaviate.auth.AuthApiKey(api_key=os.getenv(\"APIKEY\"))\n",
        "client = weaviate.WeaviateClient(\n",
        "    connection_params=ConnectionParams.from_params(\n",
        "        http_host=os.getenv(\"WEAVIATE_ENDPOINT\"),\n",
        "        http_port=8080,\n",
        "        http_secure=False,\n",
        "        grpc_host=os.getenv(\"WEAVIATE_ENDPOINT\"),\n",
        "        grpc_port=50051,\n",
        "        grpc_secure=False,\n",
        "    ),\n",
        "    additional_headers={\n",
        "        \"X-Palm-Api-Key\": os.getenv(\"PALM_APIKEY\")\n",
        "    },\n",
        "    auth_client_secret=auth_config\n",
        ")\n",
        "client.connect()"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "69af7764-5477-4e72-b0d2-0cdcc7127443",
      "metadata": {
        "id": "69af7764-5477-4e72-b0d2-0cdcc7127443"
      },
      "source": [
        "Create the `Book` collection, deleting any existing collection with that name first. Weaviate vectorizes each book description using a Vertex AI embedding model:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "030e257a-b3ee-435a-9be2-64e3ed08b8cf",
      "metadata": {
        "id": "030e257a-b3ee-435a-9be2-64e3ed08b8cf"
      },
      "outputs": [],
      "source": [
        "if client.collections.exists(\"Book\"):\n",
        "    client.collections.delete(\"Book\")\n",
        "collection = client.collections.create(\n",
        "    name=\"Book\",\n",
        "    vectorizer_config=[\n",
        "        Configure.NamedVectors.text2vec_palm(\n",
        "            name=\"description_vector\",\n",
        "            source_properties=[\"description\"],\n",
        "            project_id=os.getenv(\"GOOGLE_CLOUD_PROJECT\"),\n",
        "            model_id=\"text-embedding-005\"\n",
        "        )\n",
        "    ],\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "933e7a3d-843e-4dd1-9ab0-a4405947fc50",
      "metadata": {
        "id": "933e7a3d-843e-4dd1-9ab0-a4405947fc50"
      },
      "source": [
        "Insert data into the Weaviate collection:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "10ff90db-52f8-43f8-9ba2-615578b603f8",
      "metadata": {
        "id": "10ff90db-52f8-43f8-9ba2-615578b603f8"
      },
      "outputs": [],
      "source": [
        "with collection.batch.dynamic() as batch:\n",
        "    for i, doc in enumerate(books):  # Batch import data\n",
        "        print(f\"importing book: {i+1}\")\n",
        "        batch.add_object(properties=doc)"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "9d0ca596-9688-4df3-a8cc-dc384c1e5234",
      "metadata": {
        "id": "9d0ca596-9688-4df3-a8cc-dc384c1e5234"
      },
      "source": [
        "Define the Weaviate query function. Weaviate converts the text query into an embedding, runs a vector search, and returns the closest matches, which the function prints.\n",
        "\n",
        "It prints each result separated by a line of dashes, in the following format:\n",
        "\n",
        "- Title: Title of the book\n",
        "- Author: Author of the book\n",
        "- Publish date: Book publication date\n",
        "- Description: As stored in your document's description metadata field"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "7d1cae5f-ffa3-44ea-8b9e-fd376cdc185c",
      "metadata": {
        "id": "7d1cae5f-ffa3-44ea-8b9e-fd376cdc185c"
      },
      "outputs": [],
      "source": [
        "def handle_query(query, limit):\n",
        "    \"\"\"Run a near-text vector search and print up to `limit` matching books.\"\"\"\n",
        "    result = (\n",
        "        collection.query.near_text(\n",
        "            query=query,\n",
        "            limit=limit\n",
        "        )\n",
        "    )\n",
        "    for hit in result.objects:\n",
        "        book = hit.properties\n",
        "        print(\"Title: {}, Author: {}, Publish date: {}\".format(book[\"title\"], book[\"author\"], book[\"publishDate\"]))\n",
        "        print(book[\"description\"])\n",
        "        print(\"---------\")"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "84d4af63-4527-4e3d-9f1b-5d0cf112caf2",
      "metadata": {
        "id": "84d4af63-4527-4e3d-9f1b-5d0cf112caf2"
      },
      "source": [
        "Run the query `drama about people and unhappy love`:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "365ac7bc-f06c-4db9-81f9-67354fe18e44",
      "metadata": {
        "id": "365ac7bc-f06c-4db9-81f9-67354fe18e44"
      },
      "outputs": [],
      "source": [
        "handle_query(\"drama about people and unhappy love\", 2)"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3 (ipykernel)",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.11.0rc1"
    },
    "colab": {
      "provenance": []
    }
  },
  "nbformat": 4,
  "nbformat_minor": 5
}
